1 INTRODUCTION 

Bayesian techniques for estimating the posterior distribu- 
tions of cosmological parameters are now well established 
in astronomy (see Lahav and Liddle, 2006, and references 
therein). In the last few years, cosmologists have become in- 
creasingly interested in statistical techniques for model selec- 
tion (for an early application see Jaffe 1996; for recent sum- 
maries see Liddle, Mukherjee and Parkinson 2006; Trotta 
2008). This subject has a long history and is discussed at 
length in Jeffreys' classic monograph (Jeffreys 1961). The 
aim of model selection is to provide a measure by which to 
rank competing models. A model that is highly predictive 
should clearly be favoured over a model that is not. Model 
selection, in effect, quantifies Occam's Razor by penalizing 
complicated models with many parameters that need to be 
finely tuned to match the data. A topical example of model 
selection applied to cosmology is in assessing whether ob- 
servational data favour dynamical dark energy over a cos- 
mological constant (for recent discussions see Szydlowski, 
Kurek and Krawiec 2006; Liddle, Mukherjee, Parkinson and 
Wang 2006; Sahlen, Liddle and Parkinson 2007; Serra, Heav- 
ens and Melchiorri 2007). This is the example that we will 
use in this paper. 

Models can be ranked by computing the Bayesian Evidence, E, 
defined as the probability of the data D given the model M, 

$$E(M) = \int d\theta \, P(D|\theta M) \, \pi(\theta|M), \qquad (1)$$

where $\pi(\theta|M)$ is the prior distribution of the model 
parameters $\theta$ and $P(D|\theta M)$ is the likelihood of the 
parameters under model M. The ratio of the Evidences for two 
models, $B_{12} = E(M_1)/E(M_2)$, 



ABSTRACT 

There has been increasing interest by cosmologists in applying Bayesian techniques, 
such as Bayesian Evidence, for model selection. A typical example is in assessing 
whether observational data favour a cosmological constant over evolving dark energy. 
In this paper, the example of dark energy is used to illustrate limitations in the ap- 
choice of model and priors. An analysis of recent cosmological data shows a statisti- 
also known as the Bayes factor, provides a measure with 
which to discriminate between the models. If each model is 
assigned equal prior probability, the Bayes factor gives the 
posterior odds of the two models. A value for $B_{12}$ of order 
unity indicates that there is little to choose between the 
two models but a value of, say, $B_{12} \sim 0.01$ suggests that 
the data strongly favour model 2 over model 1*. The computation 
of Evidence can be challenging since it requires the 
evaluation of an integral (1) over the entire likelihood function. 
This can take many hours of supercomputer time if 
the cost of evaluating the likelihood function at a single point 
within a multi-dimensional parameter space is large. Rather 
than compute the Evidence, some authors have used prox- 
ies such as the Bayesian Information Criterion, which can 
be computed from the maximum of the likelihood function. 
The Bayesian Information Criterion (BIC) and various in- 
formation theoretic criteria for model selection are discussed 
by Liddle (2004, 2007) and by Trotta (2008); these will not 
be discussed in any detail in this paper which will focus on 
the Evidence defined by equation (1). 
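
The definitions in equations (1) and (2) can be illustrated with a one-parameter toy problem. The following is a minimal sketch, not the computation used in this paper: it assumes a Gaussian likelihood normalised to unity at its peak and a flat prior, and the function names `evidence_uniform_prior` and `gaussian_log_like` are hypothetical. It shows that widening the prior by a factor of ten, with the data unchanged, raises the Bayes factor in favour of the simpler model by the same factor.

```python
import math

def evidence_uniform_prior(log_like, lo, hi, n=20001):
    """Evidence of equation (1) for one parameter with a flat prior on
    [lo, hi], approximated with the composite trapezoidal rule."""
    h = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        weight = 0.5 if i in (0, n - 1) else 1.0
        total += weight * math.exp(log_like(lo + i * h))
    return total * h / (hi - lo)  # the flat prior density is 1 / (hi - lo)

def gaussian_log_like(mean, sigma):
    # Toy likelihood (an assumption of this sketch), equal to 1 at its peak.
    return lambda theta: -0.5 * ((theta - mean) / sigma) ** 2

log_like = gaussian_log_like(mean=0.0, sigma=0.1)

# Model 1: parameter fixed at zero (delta-function prior).
e_fixed = math.exp(log_like(0.0))

# Model 2: the same parameter with a narrow or a wide flat prior.
e_narrow = evidence_uniform_prior(log_like, -1.0, 1.0)
e_wide = evidence_uniform_prior(log_like, -10.0, 10.0)

b12_narrow = e_fixed / e_narrow
b12_wide = e_fixed / e_wide
print(b12_narrow, b12_wide, b12_wide / b12_narrow)
```

The likelihood is identical in both cases; only the prior range differs, which is the sensitivity discussed in the remainder of this paper.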

The application of Bayesian Evidence to cosmology has 
not met with uniform approval. The methodology has recently 
been attacked vigorously by Linder and Miquel (2007), 
and defended even more vigorously by Liddle et al. (2007). 
This author agrees with the statistical analysis of Liddle et 
al. (2007). Nevertheless, our conclusions are more in sympathy 
with those of Linder and Miquel, namely that Bayesian 
Evidence is of limited use for many applications to cosmology. 

The main reason for reaching this conclusion is that the 
very concept of a model is subjective in many cosmological 



$$B_{12} = E(M_1)/E(M_2), \qquad (2)$$



* Many authors have used qualitative guidelines suggested by 
Jeffreys (1961) to interpret Bayes factors, see Section 3. 
applications. Ideally, one would like to test a physically well 
motivated model, rather than adopting a phenomenological 
parameterisation, but this is rarely possible in cosmology 
because the underlying physics is poorly understood. As is 
evident from equation (1) computation of Bayesian Evidence 
requires assumptions concerning the prior distributions of 
any parameters specifying a model. Again, in cosmology we 
rarely have strong guiding principles to help us choose priors. 
Bayesian Evidence is useful in situations where hypotheses 
are well motivated and when there are symmetries, or other 
information, to guide the choice of priors. (Specific exam- 
ples are given in the textbook by Mackay 2003). Perceived 
difficulties with 'subjective' choices of priors are, of course, 
at the heart of the long-standing debate between 'Frequen- 
tists' and 'Bayesians' (see for example, Kendall and Stuart 
1979, §21; Jaynes 2003) and this paper has nothing new to 
add to this well-worn discussion. But even if one approaches 
statistics from a Bayesian point of view, as this author does, 
the difficulties involved in defining models and priors must 
be appreciated when interpreting Bayesian Evidence. 

In this paper, I will use the problem of dark energy 
to illustrate the points outlined in the previous paragraph. 
In the next Section, I begin with a discussion of 'skater' 
models (Linder 2005; Sahlen, Liddle and Parkinson 2007) to 
demonstrate the subjectivity involved in defining a model. 
The skater parameterization is clearly unphysical and should 
be thought of as an approximation to a more complex model 
requiring more free parameters and uncertain priors. This is 
typical of many parameterizations of evolving dark energy. I 
will then discuss the constraints on simple dynamical mod- 
els of dark energy, including a 'thawing' field evolving in a 
linear potential and a 'freezing' tracker model, to illustrate 
problems associated with priors. Some general comments on 
model selection are presented in Section 3 and the conclu- 
sions are summarized in Section 4. 



2 TESTING DARK ENERGY 

The discovery that the Universe is accelerating has stimu- 
lated an enormous amount of interest in dynamical models 
of dark energy (see for example, the comprehensive review 
by Copeland, Sami and Tsujikawa 2006). A particularly simple 
class of models is based on a scalar field $\phi$ evolving in a 
potential $V(\phi)$. The equation of motion of the field is 

$$\ddot\phi + 3H\dot\phi = -V'(\phi), \qquad (3)$$

where dots denote time derivatives, $H = \dot a/a$ where a is the 
scale factor, and the prime denotes a derivative with respect 
to the field value $\phi$. In this paper, we focus on comparing this 
class of dynamical models with the hypothesis that the dark 
energy is a cosmological constant, i.e. $V = V_0 =$ constant. 
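
To make the role of the Hubble friction term in equation (3) concrete, the following sketch integrates the field equation for a linear potential $V = V_0 + V'\phi$ in a flat universe containing only matter and the scalar field. It is an illustration under stated assumptions rather than the code used for the results in this paper: radiation is neglected, units are $8\pi G = 1$, the densities `rho_m0`, `v0` and the gradient `dv` are arbitrary illustrative inputs, and the function name `evolve_linear_field` is hypothetical.

```python
import math

def evolve_linear_field(rho_m0, v0, dv, z_start=1089.0, steps=20000):
    """Integrate phi'' + 3 H phi' = -V'(phi) for V = v0 + dv * phi using
    N = ln(a) as the time variable (flat universe, matter + field only).
    Returns (Omega_phi, w_phi) evaluated at a = 1."""
    def rhs(n, phi, phidot):
        rho = rho_m0 * math.exp(-3.0 * n) + 0.5 * phidot ** 2 + v0 + dv * phi
        hubble = math.sqrt(rho / 3.0)          # Friedmann equation
        # d(phi)/dN = phidot / H ;  d(phidot)/dN = -3 phidot - V' / H
        return phidot / hubble, -3.0 * phidot - dv / hubble

    n = -math.log(1.0 + z_start)
    dn = -n / steps
    phi, phidot = 0.0, 0.0   # field starts at rest, frozen by Hubble friction
    for _ in range(steps):   # classical fourth-order Runge-Kutta step
        k1 = rhs(n, phi, phidot)
        k2 = rhs(n + dn / 2, phi + dn / 2 * k1[0], phidot + dn / 2 * k1[1])
        k3 = rhs(n + dn / 2, phi + dn / 2 * k2[0], phidot + dn / 2 * k2[1])
        k4 = rhs(n + dn, phi + dn * k3[0], phidot + dn * k3[1])
        phi += dn / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        phidot += dn / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        n += dn

    kinetic = 0.5 * phidot ** 2
    potential = v0 + dv * phi
    rho_phi = kinetic + potential
    return rho_phi / (rho_m0 + rho_phi), (kinetic - potential) / rho_phi

# A flat potential (dv = 0) behaves exactly as a cosmological constant,
# while a negative gradient lets the field thaw so that w_phi rises above -1.
print(evolve_linear_field(0.9, 2.2, 0.0))
print(evolve_linear_field(0.9, 2.2, -1.0))
```

With `dv = 0` the field never moves and the equation of state stays at -1, which is the cosmological constant limit against which the dynamical models are compared.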



2.1 The difficulty of defining a physically well 
motivated model. 

In a 'skating' model (Linder 2005) the potential is assumed 
to be flat, $V = V_0$, but the scalar field has some kinetic 
energy, $\dot\phi \neq 0$. This leads to a non-trivial equation of 
state that evolves as 

$$\frac{dw_\phi}{d\ln a} = -3\,(1 - w_\phi^2). \qquad (4)$$

What prior should we choose for $\dot\phi$? The equation of motion 
gives $\dot\phi \propto a^{-3}$, i.e. the field velocity decays 
adiabatically as the Universe expands. So a physically well 
motivated prior for $\dot\phi$ would be a delta function centred 
around the value $\dot\phi = 0$, making the model indistinguishable 
from a cosmological constant. 

Sahlen, Liddle and Parkinson (2005, 2007) have derived 
constraints on skater models using distant supernovae and 
other cosmological data. The data do not constrain strongly 
the field velocity $\dot\phi_0$ at the present day and so the limits 
on $\dot\phi_0$ simply reflect the maximum value allowed by their 
choice of prior on the kinetic contribution to the cosmic density 
at high redshift. It can be argued that if the likelihood function 
is flat over the full domain of a parameter, then the 
Evidence is independent of the prior (Liddle et al. 2007). 
But how does one specify the domain of a parameter? As 
discussed in the next Section, if the likelihood does vary over 
the parameter range, perhaps because the data have been 
used to suggest the range, then the posterior distributions 
of other parameters, such as $V_0$, and the Evidence (1) will 
depend on the choice of prior. 

In fact, the problem is more serious than outlined above. 
We have described the skater model here because it is easy 
to see that it is a proxy for a more complicated model involving 
more parameters (and priors). This is true of many simple 
parameterizations of dynamical dark energy. How could 
skating behaviour be realized in practice? In the following 
example† 

$$V(\phi) = \frac{M^{4+\alpha}}{\phi^{\alpha}} + V_0, \qquad (5)$$

the first term in (5) is a 'tracker' potential with an attractor 
solution (Steinhardt, Wang and Zlatev 1999). It is therefore 
easy to arrange for the field to follow the attractor solution 
at high redshift and then glide on to the constant part of 
the potential with some finite $\dot\phi$ at low redshift. As a 
specific example, assume $V_0 = 2H_0^2$, $\alpha = 4$, 
$M^{4+\alpha} = 0.05H_0^2$, then at the present day 
$\Omega_\phi = 0.71$, $w_\phi = -0.97$ and $\dot\phi_0/H_0 = 0.246$ 
(actually outside the range on $\dot\phi_0/H_0$ permitted by the 
priors assumed by Sahlen et al. 2007). We would 
argue that equation (5) with attractor initial conditions is 
a physically better motivated model than a simple flat potential 
with some arbitrary choice of initial $\dot\phi$. Of course, 
this is a more complicated 'skater-like' model and requires 
the specification of priors on three parameters M, $\alpha$ and $V_0$. 
Furthermore, as the above example shows, for reasonable 
values of $\alpha$ it is difficult to get a substantial deviation from 
$w_\phi = -1$. The additional parameters therefore allow models 
that show deviations from the dynamics of a cosmological 
constant at low redshift, but the differences are small. 
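
The skating behaviour described at the start of this subsection can be checked numerically. For a flat potential the field velocity decays as $a^{-3}$, so the kinetic energy scales as $a^{-6}$ and the equation of state obeys $dw_\phi/d\ln a = -3(1 - w_\phi^2)$. The sketch below compares a finite-difference derivative with this relation; the values of `k0` and `v0` are illustrative only and the function names are hypothetical.

```python
import math

def w_skater(a, k0, v0):
    """Equation of state for a flat potential v0 and kinetic energy
    k0 * a**-6 (field velocity decaying as a**-3)."""
    kinetic = k0 * a ** -6
    return (kinetic - v0) / (kinetic + v0)

def dw_dlna(a, k0, v0, eps=1e-5):
    # Central finite difference with respect to ln(a).
    up = w_skater(a * math.exp(eps), k0, v0)
    down = w_skater(a * math.exp(-eps), k0, v0)
    return (up - down) / (2.0 * eps)

k0, v0 = 0.05, 2.0  # illustrative present-day kinetic and potential energies
for a in (0.25, 0.5, 1.0):
    w = w_skater(a, k0, v0)
    # The last two printed columns should agree to numerical accuracy.
    print(a, w, dw_dlna(a, k0, v0), -3.0 * (1.0 - w * w))
```

The output also shows how quickly $w_\phi$ relaxes towards -1 as the kinetic energy redshifts away, which is why a prior on the field velocity concentrated at zero is the natural choice for this model.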



2.2 Dependence on priors 

To make contact with previous work, we first analyse the 
simple phenomenological model with a constant equation of 
state parameter $w_0$. Evidence computations for this model 
have been presented by Liddle et al. (2006) and Serra et al. 
(2007), using broadly similar data to those used here. 

† We use natural units, $c = \hbar = 1$. The reduced Planck mass 
is $M_{pl} = (8\pi G)^{-1/2} = 2.44 \times 10^{18}$ GeV and will be set 
to unity unless explicitly stated otherwise. 

We use a compilation from the following web ad- 
dress http://braGburn.pha.jhu.edu/~arioss/R06/Davis07_R07.WV07.dat, 

listing redshifts, distance moduli and their errors for Type 
Ia supernovae. These data were used in Davis et al. (2007) 
and consist of combined data from Wood-Vasey et al. (2007) 
and Riess et al. (2007). Supernovae with redshifts less than 
0.02 were discarded to limit systematic errors associated 
with local peculiar velocities, leaving 181 supernovae with 
a maximum redshift of 1.755 (SN 1997ff). Following Sahlen 
et al. (2007), in addition to the constraints on luminosity 
distances from Type Ia supernovae, we add constraints on 
the CMB peak shift parameter $\mathcal{R}$ at the redshift of 
decoupling and on the baryon acoustic scale parameter A at the 
characteristic depth of the Sloan Digital Sky Survey (Eisenstein 
et al. 2005), assuming Gaussian distributions with 

$$\mathcal{R}(z_{dec} = 1089) = 1.70 \pm 0.03, \qquad (6a)$$
$$A(z = 0.35) = 0.474 \pm 0.017. \qquad (6b)$$

For definitions of the parameters $\mathcal{R}$ and A, and references to 
the numerical values listed in (6a, 6b), we refer the reader 
to Wang and Mukherjee (2006) and Sahlen et al. (2007). 

Spatial curvature is assumed to be zero, thus a model 
is specified by the Hubble parameter h (in units of 
100 km s$^{-1}$ Mpc$^{-1}$), the cosmological matter density parameter 
at the present day, $\Omega_m$, and the equation of state parameter 
$w_0$. To facilitate comparison with Serra et al. (2007), we 
adopt the identical flat priors for $\Omega_m$ and h with ranges 
$0.1 < \Omega_m < 0.5$, $0.56 < h < 0.72$ (and we do not attempt 
to justify these choices). For $w_0$ we will consider two choices: 
a flat prior over the range $-1 < w_0 < -1/3$, i.e. excluding the 
'phantom' regime $w_0 < -1$, and a flat prior over the range 
$-2 < w_0 < -1/3$. 
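
For readers who wish to experiment with the constant $w_0$ model, the supernova part of the likelihood can be sketched as follows. This is an illustrative outline under simplifying assumptions, not the analysis code used here: it treats the distance moduli as independent Gaussian measurements, ignores the CMB and baryon acoustic constraints of equations (6a) and (6b), and the function names `e_of_z`, `distance_modulus` and `chi2_supernovae` are hypothetical.

```python
import math

C_KM_S = 299792.458  # speed of light in km/s

def e_of_z(z, omega_m, w0):
    """Dimensionless Hubble rate H(z)/H0 for a spatially flat universe
    with matter and dark energy of constant equation of state w0."""
    return math.sqrt(omega_m * (1 + z) ** 3
                     + (1 - omega_m) * (1 + z) ** (3 * (1 + w0)))

def distance_modulus(z, h, omega_m, w0, n=1000):
    """mu = 5 log10(d_L / Mpc) + 25, with the comoving distance obtained
    from Simpson's rule on n (even) intervals."""
    step = z / n
    total = 0.0
    for i in range(n + 1):
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += weight / e_of_z(i * step, omega_m, w0)
    comoving = (C_KM_S / (100.0 * h)) * total * step / 3.0  # in Mpc
    return 5.0 * math.log10((1 + z) * comoving) + 25.0

def chi2_supernovae(data, h, omega_m, w0):
    """data: iterable of (z, mu_observed, sigma_mu) tuples."""
    return sum(((mu - distance_modulus(z, h, omega_m, w0)) / err) ** 2
               for z, mu, err in data)

# Usage with a single made-up data point (not taken from the compilation).
mock = [(0.5, 42.3, 0.2)]
print(chi2_supernovae(mock, h=0.65, omega_m=0.27, w0=-1.0))
```

The likelihood is then proportional to exp(-chi2/2), and marginalising over h within its prior range gives contours of the kind discussed in the text.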

Figure 1 shows likelihood contours in the $w_0$ - $\Omega_m$ plane 
marginalised over the Hubble parameter h. The results are 
broadly compatible with the analysis presented by Serra et 
al., though the supernova sample used here is larger and so 
the contours in Figure 1 are somewhat tighter than theirs. 
Note that the peak of the likelihood function is close to the 
cosmological constant value $w_0 = -1$. There is no evidence 
of a shift of the contours below the phantom divide line 
seen in some earlier analyses (e.g. Riess et al., 2004). There 
is evidence that the older High-z Supernovae Search Team 
(HZSST) data pull the solutions to $w_0 < -1$ (see Nesseris 
and Perivolaropoulos 2007) indicative of (unknown) systematic 
errors in the earlier data. The HZSST data are not included 
in the sample used here. 

The Evidence ratios for a cosmological constant and the 
constant $w_0$ model are listed in Table 1 for various choices 
of prior on $w_0$. The results in Table 1 agree well with those 
of Liddle et al. (2006) who used the distant supernovae data 
of Astier et al. (2006). The Evidence ratios in Table 1 are 
about a factor of two larger than those computed by Serra 
et al. (2007). Much of this difference arises because the 
latter authors include the old HZSST data which pull the 
likelihood function further into the phantom regime, hence 
penalising the $\Lambda$ model. 

The Evidence ratios in this Table indicate a marginal 
preference for the $\Lambda$ model (see Section 3.1 for remarks on 
the interpretation of Evidence), but none of the Evidence 
ratios are high and it is easy to change them by factors 
Figure 1. Constraints on $w_0$ and $\Omega_m$ from the data (primarily 
distant supernovae) described in the text. The ellipses show 1, 
2 and 3$\sigma$ contours of the marginalized likelihood function. The 
maximum of the likelihood function is shown by the cross. 

Table 1: Evidence Ratios 

Model            Prior                  E_Λ/E_Q    ln(E_Λ/E_Q) 

constant w       -1 < w < -0.333        3.42       1.23 
constant w       -2 < w < -0.333        6.15       1.82 
constant w       -1.4 < w < -0.6        2.95       1.08 
linear           0 < V_0 < 3            1.58       0.46 
linear           0 < V_0 < 6            2.59       0.95 
inverse power    0 < α < 6              8.59       2.15 
inverse power    0 < α < 2              2.78       1.02 
inverse power    0 < α < 1              1.40       0.33 



Note: $E_\Lambda$ denotes the Evidence for a model with a cosmological 
constant $\Lambda$ and $E_Q$ denotes the Evidence for the dynamical 
dark energy models discussed in the text. The first three lines list 
Evidence ratios for the model with constant $w_0$. The next two 
lines list Evidence ratios for the model with a linear potential as 
discussed in the text. The last three lines list Evidence ratios for 
tracker models with an inverse power-law potential. 

of a few by changing the range of the prior on $w_0$. This 
is demonstrated in the third line of the Table, where the 
Evidence has been recomputed assuming a flat prior over 
the narrower range $-1.4 < w_0 < -0.6$. The dependence on 
the prior is not particularly serious in this case, because none 
of the entries in Table 1 provide strong evidence to favour 
or disfavour the $\Lambda$ model. 

Following on from the discussion in Section 2.1, we 
would argue that a model with a flat prior on a constant 
value of $w_0$ is not particularly well motivated. We therefore 
seek a simple dynamical model, derivable from a potential 
$V(\phi)$, with as few free parameters as possible. One such 
model is based on the linear potential‡, 

T Note that the dimensionless parameters appearing in this equa- 
Figure 2. The left-hand panel shows contours of fixed values of $\Omega_\phi$ and $w_\phi$ for the linear 'thawing' model discussed in the 
text. No solutions are possible in the shaded regions. The right-hand panel shows the likelihood function determined from the data 
discussed in the text after marginalising over the Hubble parameter. The contours delineate 1, 2 and 3$\sigma$ confidence regions. 



$$V(\phi) = V_0 + V'\phi, \qquad (7)$$



with a negative gradient $V'$. (For discussions of the linear 
potential see e.g. Dimopoulos and Thomas 2003; Kallosh et 
al. 2003; Avelino 2005.) The zero point of the field value has 
no significance and so can be set to zero at some starting 
redshift $z_i$. The field is locked by Hubble friction (equation 3) 
until the Hubble parameter drops sufficiently that the field 
begins to roll. At this point, the dark energy will show 
interesting dynamical behaviour. Eventually, the potential will 
become negative and the Universe will collapse. This simple 
model therefore displays 'thawing' behaviour in the 
nomenclature of Caldwell and Linder (2005), followed by 'cosmic 
doomsday' in the nomenclature of Kallosh et al. (2003). 

This model has a weak dependence on the starting redshift, 
which we fix to be the decoupling redshift $z_i = 1089$. 
We 'absorb' this weak dependence into the priors on $V_0$ and 
$V'$. A model is therefore specified by these two parameters. 
The left hand panel of Figure 2 shows contours in the $V_0$ - $V'$ 
plane with constant values of $\Omega_\phi$ (the present day dark 
energy density) and $w_\phi$ (the present day dark energy equation 
of state parameter). The shaded regions delineate areas 
where no solution exists. Figure 1 shows that the data favour 
models with $\Omega_m \sim 0.27$ and $w_0 \lesssim -0.85$. The marginalised 
likelihood function for the linear model is therefore expected 
to delineate a narrow banana centred around the $\Omega_\phi \sim 0.7$ 
line. This is indeed what is found, as illustrated in the right 
hand panel of Figure 2. 

tion are related to dimensional parameters as 
$\phi/M_{pl} \to \phi$, $V_0/(M_{pl}^2 H_0^2) \to V_0$ and 
$V' M_{pl}^{-1} H_0^{-2} \to V'$. 

To compute the Evidence, priors need to be specified 
for the parameters $V_0$ and $V'$. But how do we choose these 
priors? For spatially flat models with $V' = 0$, the value of $V_0$ 
must lie within the range $0 < V_0 < 3$. However, if we allow 
non-zero values of $V'$, $V_0$ can lie outside this range. One 
possibility would be to choose a flat prior over the region 
of the $V_0$ - $V'$ plane corresponding to models which are 
accelerating at the present day. However, this choice of prior 
is hardly compelling. 

In fact, the choice of prior is not particularly critical for 
the interpretation of Figure 2 because current data provide 
relatively poor constraints on $V'$. This is illustrated by the 
fourth and fifth lines in Table 1 which list the Evidence ratios 
assuming a uniform prior in $V'$ over the full range shown in 
Figure 2 ($0 < -V' < 6$) and a uniform prior in $V_0$ over the 
ranges $0 < V_0 < 3$ and $0 < V_0 < 6$ (excluding the hatched 
regions). There is no significant evidence to favour $\Lambda$ over 
the dynamical model, and it is clear from inspection of Figure 2 
that the higher Evidence ratio in the fifth line of Table 
1 is largely a consequence of the increased range of the prior 
on $V_0$. 

As a final example, we consider a 'tracker' model with 
the Ratra-Peebles (1988) potential 

$$V(\phi) = \frac{M^{4+\alpha}}{\phi^{\alpha}}. \qquad (8)$$

An introductory review of this model is given by Martin 
(2008). As mentioned above, this potential has an attractor 
solution which drives $w_\phi$ towards the solution 

$$w_\phi \approx \frac{\alpha w_B - 2}{\alpha + 2}, \qquad (9)$$

while the scalar field is subdominant (where $w_B$ is the equation 
of state parameter of the background matter). This 
model is an example of a 'freezing' model in the terminology 
of Caldwell and Linder (2005). We set $\dot\phi = 0$ at high redshift 
(z = 10000) and choose the initial value of $\phi$ so that the field 
locks on to the attractor solution without overshoot. The low 
redshift behaviour is therefore fixed by the attractor solution 
so the model is characterised by the two parameters M 
and $\alpha$ defining the potential (8). 

Figure 3. Constraints on the parameters of the inverse power 
law potential $\alpha$ and M. As in Figure 1, the ellipses show 1, 2 and 
3$\sigma$ contours of the marginalized likelihood function. 

The marginalised likelihood for this model is shown in 
Figure 3 (using the same observational data as for Figures 
1 and 2). Evidently, the data constrain the power-law index 
to be $\alpha \lesssim 1$. However, from the theoretical point of view, 
there are no compelling constraints on the power-law index; 
values as high as $\alpha = 6$ or more have been discussed in 
the literature (Steinhardt, Wang and Zlatev 1999, Martin 
2008) and occasionally promoted as a possible solution to 
the 'hierarchy' problem associated with the mass scale M. 
The Evidence ratios for various assumed prior ranges in $\alpha$ 
are listed in the last three lines of Table 1. As expected, 
these scale almost perfectly with the width of the prior range 
assumed for $\alpha$. Nevertheless, none of the Evidence ratios 
in Table 1 are high and so one cannot reject the potential 
(8) drawn from a broad uniform prior on $\alpha$. However, it 
is obvious from the likelihood function shown in Figure 3 
that we can rule out the potential (8) for specific choices of 
$\alpha \gtrsim 1.5$. 
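
The statement that the tracker Evidence ratios scale with the width of the prior on the power-law index can be checked directly from the last three lines of Table 1. The short sketch below does this arithmetic; it assumes that each prior range starts at zero, so that the width equals the quoted upper limit, and the variable names are hypothetical.

```python
# (upper limit of the flat prior on the power-law index, E_Lambda / E_Q),
# copied from the last three lines of Table 1.
tracker_rows = [(6.0, 8.59), (2.0, 2.78), (1.0, 1.40)]

scaling = []
for (wide, e_wide), (narrow, e_narrow) in zip(tracker_rows, tracker_rows[1:]):
    width_ratio = wide / narrow          # how much the prior was widened
    evidence_ratio = e_wide / e_narrow   # how much the Evidence ratio grew
    scaling.append(evidence_ratio / width_ratio)
    print(width_ratio, evidence_ratio)

# Values close to 1 mean the Evidence ratio tracks the prior width.
print(scaling)
```

Because the likelihood is confined to small values of the index, widening the prior only adds parameter space in which the dynamical model fits poorly, and the Evidence ratio grows in proportion.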



3 COMMENTS ON THE APPLICATION OF 
MODEL SELECTION 

3.1 The Jeffreys Scale 

The previous Section shows that current data, unfortunately, 
provide relatively weak constraints on simple models 
of dynamical dark energy. The highest Evidence ratios are 
about $\Delta \ln E \sim 2$ for certain choices of prior. Plausible 
variations on the prior of a single parameter can easily change 
the Evidence ratio by $\Delta \ln E \sim 1$ or more. Many papers on 
cosmological model selection have adopted the interpretive 
scale suggested in Appendix B of Jeffreys (1961), which in the view 
of this author is not sufficiently conservative. Trotta (2008) 
presents a revised interpretative scale in his review, which 
accords with one's intuitive assessment of the relative posterior 
odds, $B_{12}$, of two models. The two scales are compared 
in Table 2. 

For the dark energy examples summarized in the previous 
Section, if the observational data improve to give 
$\Delta \ln(E_\Lambda/E_Q) \gtrsim 5$, then it would be reasonable to conclude 
that there is evidence favouring a cosmological constant. The 



Bayesian Evidence 5 

Table 2: Interpretive scales 

         Jeffreys grades                        Trotta (2008) 

ln B_12       strength of Evidence       ln B_12      strength of Evidence 

< 1.15        not worth a mention        < 1.0        inconclusive 
1.15 - 2.3    substantial                1.0 - 2.5    weak 
2.3 - 4.6     strong to very strong      2.5 - 5.0    moderate to strong 
> 4.6         decisive                   > 5          strong 



data are then providing strong enough constraints to overwhelm 
the changes in the prior volumes illustrated in Table 
1 (see Section 3.3 below). But Evidence ratios of $\Delta \ln E \sim 2$ 
are clearly too small to achieve this. Many of the problems 
outlined in the previous Section can be overcome by adopting 
a conservatively high threshold before claiming strong 
evidence against particular classes of model. The threshold 
may need to be set higher than $\Delta \ln E = 5$ if one (or both) 
of the models involves several additional parameters with 
uncertain priors. 
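
The two interpretive scales of Table 2 are easy to encode, which makes it simple to see how the same Evidence ratio is described under each. The sketch below is illustrative only: the treatment of values that fall exactly on a boundary is an assumption (Table 2 does not specify it), and the names `JEFFREYS`, `TROTTA` and `grade` are hypothetical.

```python
# Upper edge of each band in |ln B_12| and its description, from Table 2.
JEFFREYS = [(1.15, "not worth a mention"), (2.3, "substantial"),
            (4.6, "strong to very strong"), (float("inf"), "decisive")]
TROTTA = [(1.0, "inconclusive"), (2.5, "weak"),
          (5.0, "moderate to strong"), (float("inf"), "strong")]

def grade(ln_b12, scale):
    """Return the description of |ln B_12| on the given scale.
    Boundary values are assigned to the higher band (an assumption)."""
    magnitude = abs(ln_b12)
    for upper, label in scale:
        if magnitude < upper:
            return label
    return scale[-1][1]

# The largest and one of the smallest entries in Table 1.
for ln_ratio in (2.15, 0.46):
    print(ln_ratio, grade(ln_ratio, JEFFREYS), grade(ln_ratio, TROTTA))
```

The largest Evidence ratio in Table 1 is 'substantial' on the Jeffreys scale but only 'weak' on the more conservative scale, which illustrates why the choice of threshold matters.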

3.2 Stating and Varying Priors 

It is essential that authors computing Bayesian Evidence 
state their priors carefully since these are an integral part of 
the definition of a model. If there are no compelling reasons 
to guide the choice of priors, then one should demonstrate 
that the data overwhelm plausible variations in priors before 
drawing any strong conclusions on particular classes 
of model. This has not been common practice in the literature. 
For example, Table 4 of Trotta (2008) summarizes 
Evidence calculations for various cosmological model comparisons. 
Of the ten entries testing dynamical dark energy 
against $\Lambda$, only three explore variations in priors. One analysis 
quoted in this Table (Bassett, Corasaniti and Kunz 2004) 
computes Evidence for simple parameterizations of w (such 
as $w = w_0 + w_1 z$) without stating the prior ranges on the 
parameters. These authors find $\Delta \ln(E_\Lambda/E_Q) \approx 5 - 6$, 
suggesting strong evidence favouring $\Lambda$. Such high Evidence ratios 
are clearly at variance with the conclusions of Section 2. 

3.3 Model Selection compared with Parameter 
Estimation 

In many cosmological applications of model selection, we are 
dealing with highly nested problems. In each of the examples 
discussed in Section 2, the model for dynamical dark energy 
tends to the $\Lambda$ model as one additional parameter tends 
to zero ($w_0 + 1 \to 0$, $V' \to 0$, $\alpha \to 0$). In each of these 
cases, the primary question of interest is whether there is any 
empirical evidence that an additional parameter, $\lambda$, differs 
from zero. For such highly nested problems§ the Bayes factor 
for model $M_1$ ($\lambda = 0$) and model $M_2$ ($\lambda$ drawn from a prior 
distribution $\pi(\lambda)$) is simply 

$$B_{12} = \frac{P(D|\lambda = 0\, M_2)}{\int P(D|\lambda M_2)\, \pi(\lambda|M_2)\, d\lambda}, \qquad (10)$$

§ To simplify the following discussion, we will assume uniform 
priors on all parameters. 
Table 3: Likelihood ratio compared to Evidence ratio 

                                     ln(E_Λ/E_Q), uniform prior 

σ         ln L(-1)/L(w_*)      -1 < w < -1/3      -1 < w < -( 

0.1       -0.03                1.22               -0.084 
0.05      -0.50                1.77               -0.021 
0.02      -3.12                -0.53              -2.42 
0.015     -5.55                -2.67              -4.58 
0.01      -12.50               -9.22              -11.11 



As a specific example¶, imagine that future dark energy 
surveys find $w_0 = -0.95 \pm \sigma$ ($w_0 = -(1 - \lambda)$ in the notation 
used above) and we test model 1 ($w_0 = -1$, the 'null hypothesis') 
against model 2 (uniform prior in $w_0$). Table 3 lists the 
likelihood ratios and Bayes factors for two choices of prior. 
One can see that for $\sigma = 0.01$ the data swamp the dependence 
on the prior and all three entries in Table 3 strongly 
disfavour $w_0 = -1$. The likelihood ratio is as informative as 
the Evidence ratios in the exponentially dominated regime 
since it matters little whether the posterior odds of two models 
are $\sim 10^{-6}$ or $\sim 10^{-10}$: the odds are negligible in either 
case. The case of $\sigma = 0.015$ is more interesting and is an 
example of 'Lindley's paradox' (Lindley 1957). The likelihood 
function indicates a 3.33$\sigma$ discrepancy with the null 
hypothesis, yet the Evidence ratio in the third column of 
Table 3 suggests weak evidence against $w_0 = -1$. This is 
simply because the Evidence ratio compares one model that 
is disfavoured by the data ($w_0 = -1$) against another model 
that is disfavoured by the data (uniform distribution of $\lambda$ 
over the range $-1 < w_0 < -1/3$, requiring fine tuning at 
the few percent level). Lindley's paradox should not obscure 
the fact that the likelihood peaks away from $w_0 = -1$: the 
likelihood function suggests that $w_0$ differs from $-1$, and is 
therefore informative. If, following future experiments, the 
contours shown in Figures 1-3 tighten so that zero values 
for $1 + w_0$, $V'$, or $\alpha$ are exponentially suppressed, then we 
will have very strong evidence in favour of dynamical dark 
energy, irrespective of priors. If the Evidence ratios for 
reasonable choices of priors are still in the 'ambiguous' range 
$\ln(E_\Lambda/E_Q) \sim 2.5 - 5$, a modest improvement of the data 
could potentially render the issue decisive. 
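
The worked example above can be reproduced in a few lines for a Gaussian likelihood and a flat prior. The sketch below is a hedged illustration rather than the author's code: it assumes a likelihood that is exactly Gaussian in $\lambda = 1 + w_0$ with its peak at $\lambda = 0.05$, a flat prior $0 < \lambda < \lambda_{max}$ with $\lambda_{max} = 2/3$ (the range $-1 < w_0 < -1/3$), and hypothetical function names. Under these assumptions it recovers the entries of Table 3 for this prior at $\sigma \leq 0.02$ to within about 0.01.

```python
import math

def ln_likelihood_ratio(lam_peak, sigma):
    """ln L(0)/L(lam_peak) for a Gaussian likelihood in lambda."""
    return -0.5 * (lam_peak / sigma) ** 2

def ln_bayes_factor(lam_peak, sigma, lam_max):
    """ln B_12 for lambda = 0 against a flat prior 0 < lambda < lam_max."""
    root2 = math.sqrt(2.0) * sigma
    # Integral of exp(-(lam - lam_peak)^2 / (2 sigma^2)) over the prior range.
    integral = sigma * math.sqrt(math.pi / 2.0) * (
        math.erf((lam_max - lam_peak) / root2) + math.erf(lam_peak / root2))
    return ln_likelihood_ratio(lam_peak, sigma) + math.log(lam_max / integral)

lam_peak = 0.05      # corresponds to w_0 = -0.95
lam_max = 2.0 / 3.0  # corresponds to the prior -1 < w_0 < -1/3
for sigma in (0.02, 0.015, 0.01):
    print(sigma,
          round(ln_likelihood_ratio(lam_peak, sigma), 2),
          round(ln_bayes_factor(lam_peak, sigma, lam_max), 2))
```

The gap between the two printed columns is the logarithm of the prior width divided by the width of the likelihood, which is the 'Occam' term that produces Lindley's paradox at $\sigma = 0.015$.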

Finally, let us consider the case relevant to equation 
(15), namely that the likelihood function peaks at $\lambda \approx 0$ to 
within $\sim \sigma$. The null hypothesis is then favoured if the prior 
range $\lambda_{max} \gg \sigma$. But the Evidence ratio can be reduced 
to unity by adjusting the prior range to be of order $\sigma$ (c.f. 
equation 14). An 'Occam's Razor' penalty for a model with 
additional parameters can be realised only if: (a) one has 
good arguments for choosing the prior ranges of the additional 
parameters and (b) the likelihood function is compact 
with respect to these prior ranges. 



3.4 When is Bayesian Evidence Particularly 
Useful? 

In the previous sub-section we have argued that the 
marginalised likelihood function is informative and can provide 
a good indicator of whether the null hypothesis ($\Lambda$) is 
disfavoured compared to dynamical dark energy models. Let us 
suppose that $\Lambda$ is indeed disfavoured by future data. In this 
case, the likelihood contours in Figures 1-3 would break up 
into 'islands' peaked away from $\Lambda$. How do we choose between 
these three parameterizations? Bayesian Evidence is 
likely to be indispensable for this type of 'non-nested' model 
comparison. Again, the usual caveats should apply: 

(a) we should aim to compare physically well-motivated 



¶ We could equally well have used the example of the spatial 
curvature $\Omega_k$ or deviation of the scalar spectral index $n_s$ from 
unity, as discussed by Liddle et al. (2007). 



(this is related to the Savage-Dickey density ratio, see Trotta 
2007) where $P(D|\lambda M_2) = \mathcal{L}(\lambda)$ is the likelihood function 
(marginalised over all of the common parameters). For a uniform 
prior in $\lambda$, $\mathcal{L}(\lambda)$ is just the marginalised posterior 
distribution of $\lambda$ on $M_2$, and from (10) we can interpret the 
likelihood ratio $\mathcal{L}(0)/\mathcal{L}(\lambda)$ as the Bayes factor for two 
models with delta function priors centred at $\lambda = 0$ and $\lambda$. If 
we choose $\lambda = \lambda_*$, where $\lambda_*$ corresponds to the peak of the 
likelihood function, then the Bayes factor is minimised. (For 
further discussion of the relationship between likelihood ratios 
and Bayes factors see Gordon and Trotta, 2007.) Now 
suppose that the likelihood function is approximated by a 
Gaussian 

$$\mathcal{L}(\lambda) = \mathcal{L}_0 \exp\left(-\frac{(\lambda - \lambda_*)^2}{2\sigma^2}\right), \qquad (11)$$

and that we are interested in whether the parameter $\lambda$ differs 
from zero. The likelihood ratio $\mathcal{L}(0)/\mathcal{L}(\lambda_*)$ is evidently 

$$\frac{\mathcal{L}(0)}{\mathcal{L}(\lambda_*)} = \exp\left(-\frac{\lambda_*^2}{2\sigma^2}\right). \qquad (12)$$

Now assume that under model 2, the parameter $\lambda$ is drawn 
from a uniform distribution in the range $0 < \lambda < \lambda_{max}$, 
and assume further that $\lambda_{max} \gg \lambda_* \gg \sigma$. The data are then 
'informative' ($\lambda_{max} \gg \sigma$) and suggestive that $\lambda$ deviates 
from zero ($\lambda_* \gg \sigma$). In this case, 

$$\frac{E(0)}{E(\lambda)} \approx \exp\left(-\frac{\lambda_*^2}{2\sigma^2}\right) \frac{\lambda_{max}}{\sqrt{2\pi}\,\sigma}. \qquad (13)$$

If the exponential term dominates in (13), the Evidence ratio 
is exponentially suppressed and the data swamp the dependence 
on the prior. (In fact, in testing whether a parameter 
differs from zero, Jeffreys (1961, §5.2) suggests using the 
prior 

$$\pi(\lambda) \propto \frac{1}{1 + \lambda^2/\sigma^2}, \qquad (14)$$

since 'there is nothing in the problem except $\sigma$ to give a scale 
for $\lambda$'. In other words, the choice of prior is driven by the 
data and in this case one is reliant on the exponential factor 
in (13) to reject the null hypothesis.) Alternatively we might 
find $\lambda_* \lesssim \sigma$ suggesting that the parameter $\lambda$ is consistent 
with zero, with $\lambda_{max} \gg \sigma$, in which case 

$$\frac{E(0)}{E(\lambda)} \sim \sqrt{\frac{2}{\pi}}\, \frac{\lambda_{max}}{\sigma}, \qquad (15)$$

which merely tells us that under model 2 we need fine tuning 
of order $\sigma/\lambda_{max}$ to explain the data. 



© 0000 RAS, MNRAS 000, 000-000 






models (better motivated than the skater model of Section 
2.1); 

(b) compare models with as few free parameters as possible 
(cf the models of Section 2.2) to limit the sensitivity of the 
Evidence to prior volumes; 

(c) explore variations in the priors. 



4 CONCLUSIONS 

Bayesian inference has been applied widely for parameter 
estimation from cosmological data and is relatively uncon- 
troversial. The Bayesian framework can easily be extended 
to model selection, but this has proved to be more contro- 
versial. 

There is nothing wrong with the mathematical framework 
underlying Bayesian model selection. It is the perceived 
usefulness of the framework, given difficulties in specifying 
models, that is at the source of the controversy. It is important 
to recognise that Bayesian model selection differs 
in a fundamental way from Bayesian parameter estimation. 
In parameter estimation, the posterior distribution on a parameter 
is useful because it usually becomes narrower as the 
quality of constraining data improves. The sensitivity of the 
posterior distribution to the prior therefore often diminishes 
dramatically with better data. This is why Bayesian 
parameter estimation has proved relatively uncontroversial. 
(Surprisingly so, since in cases such as estimating the CMB 
quadrupole there is an irreducible sensitivity to the choice 
of prior, see Efstathiou 2003.) 

For Bayesian model selection we need to apply 'physical 
intuition' to select suitable models and the priors on model 
parameters. Once these are chosen, the data determine the 
numerical value of the Evidence (i.e. the probability of the 
data given the model) via equation (1). There is no 'up- 
dating' involved since the data return a single value of the 
Evidence given the model. In Section 2, we discussed some of 
the difficulties associated with defining physically well mo- 
tivated models and parameter ranges. Now one can argue, 
correctly, that the range of a parameter is part of the defini- 
tion of a model. However, in cosmology, it is often difficult to 
provide compelling arguments in favour of a particular pa- 
rameter range. This is certainly the case for the dark energy 
tests described in this paper. If we use the data to suggest 
the parameter ranges, for example, by examining the likeli- 
hood function, then a computation of (1) will be of limited 
value since the probability of the data given the model will 
be high by construction. 

Many applications of Bayesian Evidence to cosmology 
involve highly nested problems in which the primary question 
of interest is whether a key parameter, $\lambda$, differs from 
zero (the 'null' hypothesis). For such problems, we have argued 
that the marginalized likelihood function $\mathcal{L}(\lambda)$ is more 
informative than Evidence computed for specific, and often 
poorly motivated, choices of priors. If the likelihood function 
is exponentially suppressed at $\lambda = 0$, then we can conclude 
that there is strong evidence against the null hypothesis 
for any reasonable choice of priors. Bayesian Evidence 
is of most use in comparing non-nested models. If we are 
in the happy situation of having high quality data that rule 
out a cosmological constant, then Bayesian Evidence can be 
used to select between various dynamical models. But the 



Evidences will only be of interest if the models and priors 
are physically well motivated. 

Finally, we re-iterate that the Evidence calculations pre- 
sented in Table 1 show no significant evidence in favour of 
a cosmological constant compared to the dynamical mod- 
els of dark energy tested here. This conclusion agrees with 
similar Evidence analyses of Serra et al. (2007) and Liddle 
et al. (2007), using different models and somewhat different 
data, but disagrees with the Evidence analysis of Bassett et 
al. (2004). Several recent papers have used the Bayesian 
Information Criterion (BIC) to claim that a cosmological constant 
is favoured over dynamical dark energy (Bassett et al. 
2004; Davis et al. 2007; Sahlen et al. 2007; Kurek and 
Szydlowski 2007). However, BIC unfairly penalises models with 
many parameters if these parameters are poorly constrained 
by the data (Liddle 2004; Liddle 2007). If this paper is a 
'health warning' concerning the use of Bayesian Evidence in 
cosmology, it should be considered a 'death certificate' on 
the use of approximations such as BIC if the strict criteria 
for their applicability are not met (see Liddle 2007 for fur- 
ther details). As the models of Section 2 show, current data 
unfortunately provide relatively weak constraints on simple 
dynamical models of dark energy. 